ISO 9660
ISO 9660 is the standard file system for CD-ROMs. It is also widely used on DVD and BD media and may also be present on USB sticks or hard disks. Its specification is freely available as ECMA-119.
Overview and caveats
ISO 9660 is not a complex file system, but it has a few quirks worth remembering. Some operating systems also create non-compliant CDs, so a reader should be tolerant. The main example is the character set available for file names. Strictly, file names may only consist of uppercase letters A-Z, digits, dots, and underscores. In addition, a semicolon separates the visible file name from its version number suffix. Many operating systems also allow lowercase letters and other characters. Linux, for example, displays lowercase file names to the user even though the names on the CD are actually uppercase.
Sector size
An ISO 9660 sector is normally 2 KiB long. Although the specification allows for alternative sector sizes, you will rarely find anything other than 2 KiB.
Numerical formats
Another quirk is that ISO 9660 uses several numeric formats, and multi-byte numbers are often recorded in both-endian form. The standard specifies three ways to encode 16- and 32-bit integers: little-endian (least-significant byte first), big-endian (most-significant byte first), or both (little-endian followed by big-endian). Both-endian (LSB-MSB) fields are therefore twice as wide, which is why 32-bit LBAs often appear as 8-byte fields. Where a both-endian field is present, a little-endian CPU such as x86 can simply use the first (little-endian) half and ignore the big-endian half.
Encoding | Description |
---|---|
int8 | Unsigned 8-bit integer. |
sint8 | Signed 8-bit integer. |
int16_LSB | Little-endian encoded unsigned 16-bit integer. |
int16_MSB | Big-endian encoded unsigned 16-bit integer. |
int16_LSB-MSB | Little-endian followed by big-endian encoded unsigned 16-bit integer. |
sint16_LSB | Little-endian encoded signed 16-bit integer. |
sint16_MSB | Big-endian encoded signed 16-bit integer. |
sint16_LSB-MSB | Little-endian followed by big-endian encoded signed 16-bit integer. |
int32_LSB | Little-endian encoded unsigned 32-bit integer. |
int32_MSB | Big-endian encoded unsigned 32-bit integer. |
int32_LSB-MSB | Little-endian followed by big-endian encoded unsigned 32-bit integer. |
sint32_LSB | Little-endian encoded signed 32-bit integer. |
sint32_MSB | Big-endian encoded signed 32-bit integer. |
sint32_LSB-MSB | Little-endian followed by big-endian encoded signed 32-bit integer. |
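As a minimal sketch in C (the function names are illustrative, not from any particular library), the unsigned encodings can be read byte by byte, which also works on big-endian hosts. For both-endian fields the sketch returns the little-endian half and reports whether the big-endian copy agrees; a lenient reader may choose to ignore a mismatch.

```c
#include <stdint.h>

/* Little-endian 16-bit value (int16_LSB). */
static uint16_t read_u16_lsb(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Big-endian 16-bit value (int16_MSB). */
static uint16_t read_u16_msb(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Little-endian 32-bit value (int32_LSB). */
static uint32_t read_u32_lsb(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Big-endian 32-bit value (int32_MSB). */
static uint32_t read_u32_msb(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* int16_LSB-MSB (4 bytes): stores the LE half; returns -1 if the halves differ. */
static int read_u16_both(const uint8_t *p, uint16_t *out) {
    uint16_t le = read_u16_lsb(p), be = read_u16_msb(p + 2);
    *out = le;
    return le == be ? 0 : -1;
}

/* int32_LSB-MSB (8 bytes): stores the LE half; returns -1 if the halves differ. */
static int read_u32_both(const uint8_t *p, uint32_t *out) {
    uint32_t le = read_u32_lsb(p), be = read_u32_msb(p + 4);
    *out = le;
    return le == be ? 0 : -1;
}
```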
Date/time format
The date/time format used in the Primary Volume Descriptor is denoted as dec-datetime and uses ASCII digits to represent the main parts of the date/time:
Offset | Size | Datatype | Description |
---|---|---|---|
0 | 4 | strD | Year from 1 to 9999. |
4 | 2 | strD | Month from 1 to 12. |
6 | 2 | strD | Day from 1 to 31. |
8 | 2 | strD | Hour from 0 to 23. |
10 | 2 | strD | Minute from 0 to 59. |
12 | 2 | strD | Second from 0 to 59. |
14 | 2 | strD | Hundredths of a second from 0 to 99. |
16 | 1 | int8 | Time zone offset from GMT in 15 minute intervals, starting at interval -48 (west) and running up to interval 52 (east). So value 0 indicates interval -48 which equals GMT-12 hours, and value 100 indicates interval 52 which equals GMT+13 hours. |
All fields except the GMT offset are ASCII digits. When the date and time are not specified, all string fields are ASCII '0' (16 ASCII zeroes in total) and the GMT offset is binary zero.
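A possible C sketch for decoding a dec-datetime field follows. The struct and function names are hypothetical, and the time zone byte is interpreted exactly as described in the table above (stored value minus 48).

```c
#include <stdint.h>

/* Decoded form of a 17-byte dec-datetime field (hypothetical struct). */
struct iso_dec_datetime {
    int year, month, day, hour, minute, second, hundredths;
    int gmt_offset_quarters; /* -48 .. +52, in 15-minute units */
    int is_unspecified;      /* 1 if all digits are '0' and the offset byte is 0 */
};

/* Parse n ASCII digits; returns -1 if any byte is not a digit. */
static int parse_digits(const uint8_t *p, int n) {
    int v = 0;
    for (int i = 0; i < n; i++) {
        if (p[i] < '0' || p[i] > '9')
            return -1;
        v = v * 10 + (p[i] - '0');
    }
    return v;
}

/* Returns 0 on success, -1 on a malformed field. */
static int parse_dec_datetime(const uint8_t f[17], struct iso_dec_datetime *out) {
    static const int width[7] = {4, 2, 2, 2, 2, 2, 2};
    int vals[7], off = 0, all_zero = 1;
    for (int i = 0; i < 7; i++) {
        vals[i] = parse_digits(f + off, width[i]);
        if (vals[i] < 0)
            return -1;
        if (vals[i] != 0)
            all_zero = 0;
        off += width[i];
    }
    out->year = vals[0];
    out->month = vals[1];
    out->day = vals[2];
    out->hour = vals[3];
    out->minute = vals[4];
    out->second = vals[5];
    out->hundredths = vals[6];
    /* Per the table above: raw 0 means interval -48, raw 100 means +52. */
    out->gmt_offset_quarters = (int)f[16] - 48;
    out->is_unspecified = all_zero && f[16] == 0;
    return 0;
}
```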
String format
Character strings are encoded with ASCII encoding. The specification does not permit all characters. It defines two sets of characters: 'a-characters' and 'd-characters'. You will see these terms used in the descriptor tables throughout this article. The character sets are:
a-characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ ! " % & ' ( ) * + , - . / : ; < = > ?
d-characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _
Encoding | Description |
---|---|
strA | String with only ASCII a-characters, padded to the right with spaces. |
strD | String with only ASCII d-characters, padded to the right with spaces. |
Note that not all CDs strictly adhere to the character sets specified in ISO 9660.
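Since string fields are padded on the right with spaces, a reader normally trims them before use. The hypothetical helpers below do that and perform a strict d-character check; given that not all CDs comply, a real driver may prefer to treat a failed check as a warning rather than an error.

```c
#include <stddef.h>

/* d-characters: A-Z, 0-9 and '_' (see the list above). */
static int is_d_char(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_';
}

/* Length of a space-padded field with trailing spaces removed. */
static size_t trimmed_len(const char *s, size_t n) {
    while (n > 0 && s[n - 1] == ' ')
        n--;
    return n;
}

/* Returns 1 if the trimmed field contains only d-characters (strict check). */
static int is_valid_strd(const char *s, size_t n) {
    size_t len = trimmed_len(s, n);
    for (size_t i = 0; i < len; i++)
        if (!is_d_char((unsigned char)s[i]))
            return 0;
    return 1;
}
```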
Filenames
Filenames must use d-characters (strD), plus a dot and a semicolon, each of which must occur exactly once per filename. A filename is composed of a File Name, a dot, a File Name Extension, a semicolon, and a version number in decimal digits. The semicolon and version number are usually not displayed to the user.
There are three Levels of Interchange defined. Level 1 allows filenames with a File Name length of 8 and an extension length of 3 (like MS-DOS). Levels 2 and 3 allow File Name and File Name Extension to have a combined length of up to 30 characters.
The ECMA-119 Directory Record format can hold composed names of up to 222 characters. This would violate the specs but must nevertheless be handled by a reader of the filesystem.
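Because the version suffix is usually hidden from the user, a reader typically strips it before displaying or comparing names. The following C sketch (the helper name is hypothetical) drops everything from the semicolon on, plus the dot that remains when the extension is empty. It sizes the output for the 222-character worst case mentioned above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy an on-disc file identifier such as "README.TXT;1" into out,
 * dropping the ";version" suffix and a trailing dot left by an empty
 * extension ("MAKEFILE.;1" -> "MAKEFILE"). len may be up to 222, so
 * out should hold at least 223 bytes. Returns the resulting length.
 */
static size_t display_name(const char *id, size_t len, char *out, size_t out_size) {
    const char *semi = memchr(id, ';', len);
    size_t n = semi ? (size_t)(semi - id) : len;
    if (n > 0 && id[n - 1] == '.')
        n--;
    if (out_size == 0)
        return 0;
    if (n >= out_size)
        n = out_size - 1;   /* truncate rather than overflow */
    memcpy(out, id, n);
    out[n] = '\0';
    return n;
}
```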
Size Limitations
An ISO 9660 filesystem can have up to 2^32 blocks; with 2 KiB blocks that is 2^32 × 2048 = 2^43 bytes, i.e. 8 TiB. In practice it is limited to the size of optical media (about 100 GB with triple-layer BD-R XL).
The maximum size of a data file depends on the Level of Interchange intended for the filesystem. Levels 1 and 2 allow 4 GiB - 1 bytes, because a single Directory Record can describe at most that many bytes. Level 3 allows multiple consecutive Directory Records with the same name, whose extents are concatenated into a single data file. A single data file can therefore fill nearly the whole 8 TiB image.
System Area
An ISO 9660 filesystem begins with a 32 KiB System Area (sectors 0x00-0x0F) that may hold arbitrary data. It is often used to store boot information for the case where the filesystem is not on optical media but on a hard-disk-like device such as a USB stick.
So be prepared to find a Master Boot Record (MBR, for BIOS), a GUID Partition Table (GPT, for EFI), or an Apple Partition Map (APM) at that location.
Volume Descriptors
When preparing to mount a CD, your first action will be reading the volume descriptors (specifically, you will be looking for the Primary Volume Descriptor).
Since sectors 0x00-0x0F of the CD are reserved as System Area, the Volume Descriptors can be found starting at sector 0x10 (16). The format of the volume descriptors is as follows:
Offset | Length (bytes) | Field name | Datatype | Description |
---|---|---|---|---|
0 | 1 | Type | int8 | Volume Descriptor type code (see below). |
1 | 5 | Identifier | strA | Always 'CD001'. |
6 | 1 | Version | int8 | Volume Descriptor Version (0x01). |
7 | 2041 | Data | - | Depends on the volume descriptor type. |
Each volume descriptor is therefore exactly one sector (2048 bytes) long.
Volume Descriptor Type Codes
The Volume Descriptor Type field specifies the type of Volume Descriptor:
Value | Description |
---|---|
0 | Boot Record |
1 | Primary Volume Descriptor |
2 | Supplementary Volume Descriptor |
3 | Volume Partition Descriptor |
4-254 | Reserved |
255 | Volume Descriptor Set Terminator |
When starting out with a basic CD, we are interested in the Primary Volume Descriptor, which points to both the root directory and the path tables; either can be used to find any file on the CD. Using the path table suits minimal implementations that do not want to walk the directory hierarchy node by node. It can be slower (it involves string comparisons across the directories of the entire file system) but is easier to implement.
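The search for the Primary Volume Descriptor can be sketched in C as a loop over consecutive sectors starting at sector 16. This sketch assumes the whole image is already in memory and uses 2048-byte sectors; a real driver would read one sector at a time from the device. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ISO_SECTOR_SIZE 2048
#define ISO_VD_FIRST_SECTOR 16

/*
 * Scan the volume descriptor set of an ISO image held in memory and
 * return a pointer to the first descriptor of the wanted type
 * (e.g. 1 = Primary Volume Descriptor), or NULL if it is not found
 * before the Set Terminator (255) or the end of the image.
 */
static const uint8_t *find_volume_descriptor(const uint8_t *image, size_t image_size,
                                             uint8_t wanted_type) {
    for (size_t sector = ISO_VD_FIRST_SECTOR;
         (sector + 1) * ISO_SECTOR_SIZE <= image_size; sector++) {
        const uint8_t *vd = image + sector * ISO_SECTOR_SIZE;
        if (memcmp(vd + 1, "CD001", 5) != 0)
            return NULL;              /* not a volume descriptor: stop */
        if (vd[0] == wanted_type)
            return vd;
        if (vd[0] == 255)
            return NULL;              /* Volume Descriptor Set Terminator */
    }
    return NULL;
}
```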
The Boot Record
The first type of Volume Descriptor is the "Boot Record". The descriptor format is as follows:
Offset | Length (bytes) | Field name | Datatype | Description |
---|---|---|---|---|
0 | 1 | Type | int8 | Zero indicates a boot record. |
1 | 5 | Identifier | strA | Always "CD001". |
6 | 1 | Version | int8 | Volume Descriptor Version (0x01). |
7 | 32 | Boot System Identifier | strA | ID of the system which can act on and boot the system from the boot record. |
39 | 32 | Boot Identifier | strA | Identification of the boot system defined in the rest of this descriptor. |
71 | 1977 | Boot System Use | - | Custom - used by the boot system. |
The most common Boot System Use specification is El Torito. It stores the block address of the El Torito Boot Catalog as a little-endian 32-bit number at bytes 71 to 74. This catalog lists the available boot images, which serve as the starting points for booting systems.
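A C sketch of locating the Boot Catalog from a Boot Record follows. The identifier string "EL TORITO SPECIFICATION" in the Boot System Identifier field comes from the El Torito specification rather than from this article; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Given a 2048-byte Boot Record volume descriptor (type 0), return 1 and
 * store the Boot Catalog LBA if it is an El Torito boot record, else 0.
 */
static int el_torito_catalog_lba(const uint8_t vd[2048], uint32_t *catalog_lba) {
    static const char el_torito_id[] = "EL TORITO SPECIFICATION";
    if (vd[0] != 0 || memcmp(vd + 1, "CD001", 5) != 0)
        return 0;
    /* Boot System Identifier (offset 7, 32 bytes); compare only the text part. */
    if (memcmp(vd + 7, el_torito_id, sizeof el_torito_id - 1) != 0)
        return 0;
    /* Bytes 71-74: little-endian LBA of the Boot Catalog. */
    *catalog_lba = (uint32_t)vd[71] | ((uint32_t)vd[72] << 8) |
                   ((uint32_t)vd[73] << 16) | ((uint32_t)vd[74] << 24);
    return 1;
}
```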
The Primary Volume Descriptor
This is a lengthy descriptor, but it contains some very useful information for reading the rest of the file system.
Offset | Length (bytes) | Field name | Datatype | Description |
---|---|---|---|---|
0 | 1 | Type Code | int8 | Always 0x01 for a Primary Volume Descriptor. |
1 | 5 | Standard Identifier | strA | Always 'CD001'. |
6 | 1 | Version | int8 | Always 0x01. |
7 | 1 | Unused | - | Always 0x00. |
8 | 32 | System Identifier | strA | The name of the system that can act upon sectors 0x00-0x0F for the volume. |
40 | 32 | Volume Identifier | strD | Identification of this volume. |
72 | 8 | Unused Field | - | All zeroes. |
80 | 8 | Volume Space Size | int32_LSB-MSB | Number of Logical Blocks in which the volume is recorded. |
88 | 32 | Unused Field | - | All zeroes. |
120 | 4 | Volume Set Size | int16_LSB-MSB | The size of the set in this logical volume (number of disks). |
124 | 4 | Volume Sequence Number | int16_LSB-MSB | The number of this disk in the Volume Set. |
128 | 4 | Logical Block Size | int16_LSB-MSB | The size in bytes of a logical block. NB: This means that a logical block on a CD could be something other than 2 KiB! |
132 | 8 | Path Table Size | int32_LSB-MSB | The size in bytes of the path table. |
140 | 4 | Location of Type-L Path Table | int32_LSB | LBA location of the path table. The path table pointed to contains only little-endian values. |
144 | 4 | Location of the Optional Type-L Path Table | int32_LSB | LBA location of the optional path table. The path table pointed to contains only little-endian values. Zero means that no optional path table exists. |
148 | 4 | Location of Type-M Path Table | int32_MSB | LBA location of the path table. The path table pointed to contains only big-endian values. |
152 | 4 | Location of Optional Type-M Path Table | int32_MSB | LBA location of the optional path table. The path table pointed to contains only big-endian values. Zero means that no optional path table exists. |
156 | 34 | Directory entry for the root directory | - | Note that this is not an LBA address, it is the actual Directory Record, which contains a single byte Directory Identifier (0x00), hence the fixed 34 byte size. |
190 | 128 | Volume Set Identifier | strD | Identifier of the volume set of which this volume is a member. |
318 | 128 | Publisher Identifier | strA | The volume publisher. For extended publisher information, the first byte should be 0x5F, followed by the filename of a file in the root directory. If not specified, all bytes should be 0x20. |
446 | 128 | Data Preparer Identifier | strA | The identifier of the person(s) who prepared the data for this volume. For extended preparation information, the first byte should be 0x5F, followed by the filename of a file in the root directory. If not specified, all bytes should be 0x20. |
574 | 128 | Application Identifier | strA | Identifies how the data are recorded on this volume. For extended information, the first byte should be 0x5F, followed by the filename of a file in the root directory. If not specified, all bytes should be 0x20. |
702 | 37 | Copyright File Identifier | strD | Filename of a file in the root directory that contains copyright information for this volume set. If not specified, all bytes should be 0x20. |
739 | 37 | Abstract File Identifier | strD | Filename of a file in the root directory that contains abstract information for this volume set. If not specified, all bytes should be 0x20. |
776 | 37 | Bibliographic File Identifier | strD | Filename of a file in the root directory that contains bibliographic information for this volume set. If not specified, all bytes should be 0x20. |
813 | 17 | Volume Creation Date and Time | dec-datetime | The date and time of when the volume was created. |
830 | 17 | Volume Modification Date and Time | dec-datetime | The date and time of when the volume was modified. |
847 | 17 | Volume Expiration Date and Time | dec-datetime | The date and time after which this volume is considered to be obsolete. If not specified, then the volume is never considered to be obsolete. |
864 | 17 | Volume Effective Date and Time | dec-datetime | The date and time after which the volume may be used. If not specified, the volume may be used immediately. |
881 | 1 | File Structure Version | int8 | The directory records and path table version (always 0x01). |
882 | 1 | Unused | - | Always 0x00. |
883 | 512 | Application Used | - | Contents not defined by ISO 9660. |
1395 | 653 | Reserved | - | Reserved by ISO. |
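Once the PVD is in memory, the fields most readers need can be pulled out at the offsets listed above. This C sketch (struct and function names are hypothetical) reads only the little-endian half of the both-endian fields, which is sufficient on x86 as described earlier.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of the PVD fields a reader usually needs. */
struct iso_pvd_info {
    uint32_t volume_space_size;   /* in logical blocks */
    uint16_t logical_block_size;  /* normally 2048 */
    uint32_t path_table_size;     /* in bytes */
    uint32_t type_l_path_table;   /* LBA of the little-endian path table */
    uint32_t root_extent_lba;     /* from the embedded root directory record */
    uint32_t root_extent_size;    /* in bytes */
    char volume_id[33];           /* trailing spaces removed */
};

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if pvd is not a Primary Volume Descriptor. */
static int parse_pvd(const uint8_t pvd[2048], struct iso_pvd_info *info) {
    if (pvd[0] != 1 || memcmp(pvd + 1, "CD001", 5) != 0)
        return -1;
    /* Both-endian fields: read the little-endian half only. */
    info->volume_space_size = le32(pvd + 80);
    info->logical_block_size = (uint16_t)(pvd[128] | (pvd[129] << 8));
    info->path_table_size = le32(pvd + 132);
    info->type_l_path_table = le32(pvd + 140);
    /* Root directory record at 156: extent LBA at +2, data length at +10. */
    info->root_extent_lba = le32(pvd + 156 + 2);
    info->root_extent_size = le32(pvd + 156 + 10);
    /* Volume Identifier: 32 bytes at offset 40, space-padded. */
    int n = 32;
    while (n > 0 && pvd[40 + n - 1] == ' ')
        n--;
    memcpy(info->volume_id, pvd + 40, (size_t)n);
    info->volume_id[n] = '\0';
    return 0;
}
```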
Volume Descriptor Set Terminator
The Volume Descriptor Set Terminator does not currently define bytes 7-2047 of its Volume Descriptor. This means that the only fields in use for the volume set terminator are the type code (255), the standard identifier ('CD001') and the descriptor version (0x01).
Offset | Length (bytes) | Field name | Datatype | Description |
---|---|---|---|---|
0 | 1 | Type | int8 | 255 indicates a Volume Descriptor Set Terminator. |
1 | 5 | Identifier | strA | Always "CD001". |
6 | 1 | Version | int8 | Volume Descriptor Version (0x01). |
The Path Table
The Path Table contains a well-ordered sequence of records describing every directory extent on the CD. There is one notable limitation: the Path Table can only contain 65536 records, because of the size of the 16-bit "Parent Directory Number" field. If a disc has more directories than that, some CD authoring software ignores the limit and creates a non-compliant CD (some earlier versions of Nero did this, for example). If your file system driver uses the path table, be aware of this possibility. Windows uses the Path Table and fails on such non-compliant CDs (the additional nodes exist but appear as zero-byte entries). Linux, which uses the directory records, is not affected by this issue.
The location of the path tables can be found in the Primary Volume Descriptor. There are two table types - the L-Path table (relevant to x86) and the M-Path table. The only difference between these two tables is that multi-byte values in the L-Table are LSB-first and the values in the M-Table are MSB-first.
The structure of a Path Table Entry is as follows:
Offset | Size | Description |
---|---|---|
0 | 1 | Length of Directory Identifier |
1 | 1 | Extended Attribute Record Length |
2 | 4 | Location of Extent (LBA). This is in a different format depending on whether this is the L-Table or M-Table (see explanation above). |
6 | 2 | Directory number of parent directory (an index into the path table). This is the field that limits the table to 65536 records. |
8 | (variable) | Directory Identifier (name) in d-characters. |
(variable) | 1 | Padding Field - contains a zero if the Length of Directory Identifier field is odd, not present otherwise. This means that each table entry will always start on an even byte number. |
The path table is in ascending order of directory level and is alphabetically sorted within each directory level.
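The record layout above can be decoded with a simple loop. The sketch below handles only the L-type table; for the M-type table, the LBA and parent number would be read big-endian instead. The struct and function names are hypothetical, and the table is assumed to be loaded in memory using the Path Table Size from the PVD.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded path table record (hypothetical struct). */
struct path_entry {
    uint32_t extent_lba;
    uint16_t parent;        /* 1-based index of the parent record */
    const uint8_t *name;    /* points into the table; not NUL-terminated */
    uint8_t name_len;
};

/*
 * Decode an L-type (little-endian) path table of table_size bytes.
 * Stores at most max entries and returns the number of records found,
 * or -1 if a record runs past the end of the table.
 */
static int parse_l_path_table(const uint8_t *table, size_t table_size,
                              struct path_entry *out, int max) {
    size_t pos = 0;
    int count = 0;
    while (pos + 8 <= table_size) {
        uint8_t len = table[pos];
        if (len == 0)
            break;                               /* defensive: malformed record */
        size_t rec = 8 + len + (len & 1);        /* pad byte if len is odd */
        if (pos + 8 + len > table_size)
            return -1;
        if (count < max) {
            const uint8_t *p = table + pos;
            out[count].extent_lba = (uint32_t)p[2] | ((uint32_t)p[3] << 8) |
                                    ((uint32_t)p[4] << 16) | ((uint32_t)p[5] << 24);
            out[count].parent = (uint16_t)(p[6] | (p[7] << 8));
            out[count].name = p + 8;
            out[count].name_len = len;
        }
        count++;
        pos += rec;
    }
    return count;
}
```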
Directories
At some point when reading an ISO 9660 CD, you will need a directory record to locate a file, even if you generally use the path table to locate the directory. Unlike the path tables, there is only one version of each directory, and multi-byte numbers are in both-endian format. Every directory starts with two special entries: one whose identifier is the single byte 0x00, describing the "." entry, and one whose identifier is the single byte 0x01, describing the ".." entry. A directory record is laid out as follows:
Offset | Size | Type | Description |
---|---|---|---|
0 | 1 | int8 | Length of Directory Record. |
1 | 1 | int8 | Extended Attribute Record length. |
2 | 8 | int32_LSB-MSB | Location of extent (LBA) in both-endian format. |
10 | 8 | int32_LSB-MSB | Data length (size of extent) in both-endian format. |
18 | 7 | see format below | Recording date and time. |
25 | 1 | see below | File flags. |
26 | 1 | int8 | File unit size for files recorded in interleaved mode, zero otherwise. |
27 | 1 | int8 | Interleave gap size for files recorded in interleaved mode, zero otherwise. |
28 | 4 | int16_LSB-MSB | Volume sequence number - the volume that this extent is recorded on, in 16 bit both-endian format. |
32 | 1 | int8 | Length of file identifier (file name). For files, the identifier ends with a ';' character followed by the version number in ASCII-coded decimal (e.g. '1'). |
33 | (variable) | strD | File identifier. |
(variable) | 1 | -- | Padding field - zero if length of file identifier is even, otherwise, this field is not present. This means that a directory entry will always start on an even byte number. |
(variable) | (variable) | -- | System Use - The remaining bytes up to the maximum record size of 255 may be used for extensions of ISO 9660. The most common one is the System Use Sharing Protocol (SUSP) and its application, the Rock Ridge Interchange Protocol (RRIP). |
Even if a directory spans multiple sectors, directory records are not permitted to cross a sector boundary (unlike path table records). Where there is not enough space at the end of a sector for an entire directory record, the rest of that sector is zero-padded and the next consecutive sector is used.
Some of the above fields need explanation. Unfortunately, the date/time format differs from the one used in the Primary Volume Descriptor. The Date/Time format is:
Offset | Size | Description |
---|---|---|
0 | 1 | Number of years since 1900. |
1 | 1 | Month of the year from 1 to 12. |
2 | 1 | Day of the month from 1 to 31. |
3 | 1 | Hour of the day from 0 to 23. |
4 | 1 | Minute of the hour from 0 to 59. |
5 | 1 | Second of the minute from 0 to 59. |
6 | 1 | Offset from GMT in 15 minute intervals from -48 (West) to +52 (East). |
This is quite a contrast to the PVD which contains ASCII encoded decimal values, but this format is presumably used to save disc space over a large number of entries.
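A C sketch for decoding this 7-byte timestamp follows (hypothetical names). It reads the last byte as a signed count of 15-minute intervals, matching the -48 to +52 range in the table above.

```c
#include <stdint.h>

/* Decoded 7-byte directory record timestamp (hypothetical struct). */
struct iso_dir_datetime {
    int year;                /* full year, e.g. 2024 */
    int month, day, hour, minute, second;
    int gmt_offset_minutes;
};

static void parse_dir_datetime(const uint8_t f[7], struct iso_dir_datetime *out) {
    out->year = 1900 + f[0];
    out->month = f[1];
    out->day = f[2];
    out->hour = f[3];
    out->minute = f[4];
    out->second = f[5];
    /* Byte 6: signed 8-bit count of 15-minute intervals (-48..+52). */
    int quarters = f[6] < 128 ? f[6] : f[6] - 256;
    out->gmt_offset_minutes = quarters * 15;
}
```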
The other field that needs some explanation is the File Flags field. This is represented by one bit flags as follows:
Bit | Description |
---|---|
0 | If set, the existence of this file need not be made known to the user (basically a 'hidden' flag). |
1 | If set, this record describes a directory (in other words, it is a subdirectory extent). |
2 | If set, this file is an "Associated File". |
3 | If set, the extended attribute record contains information about the format of this file. |
4 | If set, owner and group permissions are set in the extended attribute record. |
5 & 6 | Reserved |
7 | If set, this is not the final directory record for this file (for files spanning several extents, for example files over 4 GiB long). |
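The sector-boundary rule described earlier means that a directory walker must treat a zero length byte as "skip to the next sector" rather than as the end of the directory. A minimal C sketch of such a walker follows, assuming the directory extent (its Data Length bytes) is already in memory and 2048-byte sectors are used. The flag macro names and the callback-based interface are hypothetical; the example callback counts subdirectories using bit 1 of the File Flags and skips the "." and ".." entries.

```c
#include <stddef.h>
#include <stdint.h>

#define ISO_SECTOR_SIZE       2048
#define ISO_FLAG_HIDDEN       0x01  /* bit 0 */
#define ISO_FLAG_DIRECTORY    0x02  /* bit 1 */
#define ISO_FLAG_MULTI_EXTENT 0x80  /* bit 7: file continues in the next record */

/* Callback receives each directory record; return nonzero to stop. */
typedef int (*dir_record_fn)(const uint8_t *rec, void *ctx);

/*
 * Walk all records in a directory extent already loaded into memory.
 * A zero length byte means the rest of the sector is padding, so the
 * walk skips to the next sector boundary. Returns -1 on a malformed record.
 */
static int for_each_dir_record(const uint8_t *extent, size_t size,
                               dir_record_fn fn, void *ctx) {
    size_t pos = 0;
    while (pos < size) {
        uint8_t len = extent[pos];
        if (len == 0) {
            pos = (pos / ISO_SECTOR_SIZE + 1) * ISO_SECTOR_SIZE;
            continue;
        }
        if (len < 34 || pos + len > size)   /* 33 fixed bytes + >= 1 id byte */
            return -1;
        if (fn(extent + pos, ctx))
            return 0;
        pos += len;
    }
    return 0;
}

/* Example callback: count subdirectory records, skipping "." and "..". */
static int count_subdirs(const uint8_t *rec, void *ctx) {
    int *count = ctx;
    uint8_t id_len = rec[32];
    int is_dot_entry = id_len == 1 && (rec[33] == 0 || rec[33] == 1);
    if ((rec[25] & ISO_FLAG_DIRECTORY) && !is_dot_entry)
        (*count)++;
    return 0;
}
```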
Locating Data on the CD
By now, you should see that there are two main ways to navigate to a file record: you can either search the path table or search the full directory structure. You may find it more convenient and faster to cache the path table, loading directories only when necessary.
Searching the Path Table
If you are using the Path Table method, you will still need to know about Directory Records to find the file you are looking for. Basically, you search the path in a reverse order, following the "Parent Directory" links in the Path Table. Once you have located the directory containing the file you want, load that Directory and scan it for the appropriate file name.
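The text above describes a search that works backwards along the "Parent Directory" links. The C sketch below instead walks the path forwards, which relies on the same links: for each path component it looks for a record whose parent is the directory found so far. It assumes the path table has already been decoded into an array (the struct and function names are hypothetical), and it compares names byte for byte, which matches the uppercase d-character identifiers of a compliant disc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded path table record (hypothetical struct; records are numbered from 1). */
struct pt_rec {
    uint32_t extent_lba;
    uint16_t parent;
    const char *name;
    size_t name_len;
};

/*
 * Resolve an absolute directory path such as "/BOOT/MYLOADER" against
 * recs[0..count-1] (record number i+1 is recs[i]; record 1 is the root).
 * Returns the 1-based record number, or 0 if not found.
 */
static unsigned find_dir_by_path(const struct pt_rec *recs, size_t count, const char *path) {
    unsigned current = 1;                          /* root directory */
    if (count == 0)
        return 0;
    while (*path == '/')
        path++;
    while (*path != '\0') {
        size_t len = strcspn(path, "/");
        unsigned found = 0;
        for (size_t i = 1; i < count && !found; i++) {   /* skip the root record */
            if (recs[i].parent == current && recs[i].name_len == len &&
                memcmp(recs[i].name, path, len) == 0)
                found = (unsigned)(i + 1);
        }
        if (!found)
            return 0;
        current = found;
        path += len;
        while (*path == '/')
            path++;
    }
    return current;
}
```

The directory's extent LBA is then `recs[result - 1].extent_lba`, and the directory is scanned for the file name as described above.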
Recursing from the Root Directory
Alternatively, you can ignore the Path Table and just cache the root directory record from the Primary Volume Descriptor, then load each directory in turn. For example, for the path '/BOOT/MYLOADER/STAGE2.BIN':
- Read the PVD into memory. Bytes 156-189 contain the root directory record.
- Load the root directory using the LBA and data length values in this root directory record.
- Scan the root directory's records for the identifier 'BOOT' (directory identifiers carry no ';1' version suffix).
- If found, use its LBA and length values to load the 'BOOT' directory into memory.
- Repeat the previous two steps for the directory identifier 'MYLOADER'.
- Scan the 'MYLOADER' directory for 'STAGE2.BIN;1'. If found, you can use its LBA and length values to load your file into memory.
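The steps above can be sketched in C for an image held entirely in memory. To stay short, the sketch assumes the PVD is the first volume descriptor (at sector 16) and that blocks are 2048 bytes; a real reader should scan the descriptor set and honor the Logical Block Size field. It compares names while ignoring the ';1' version suffix, so the same comparison works for directory and file identifiers. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR 2048

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Compare a record's identifier with a path component, ignoring ";version". */
static int name_matches(const uint8_t *rec, const char *comp, size_t comp_len) {
    size_t id_len = rec[32];
    const uint8_t *id = rec + 33;
    const uint8_t *semi = memchr(id, ';', id_len);
    if (semi)
        id_len = (size_t)(semi - id);
    return id_len == comp_len && memcmp(id, comp, comp_len) == 0;
}

/*
 * Resolve an absolute path such as "/BOOT/MYLOADER/STAGE2.BIN" in an ISO
 * image held in memory. On success stores the extent LBA and data length
 * of the final component and returns 0; returns -1 otherwise.
 */
static int iso_lookup(const uint8_t *image, size_t image_size, const char *path,
                      uint32_t *lba_out, uint32_t *size_out) {
    if (image_size < 17 * SECTOR)
        return -1;
    const uint8_t *root = image + 16 * SECTOR + 156;   /* root record in the PVD */
    uint32_t lba = le32(root + 2), size = le32(root + 10);
    while (*path == '/')
        path++;
    while (*path != '\0') {
        size_t comp_len = strcspn(path, "/");
        if ((uint64_t)lba * SECTOR + size > image_size)
            return -1;
        const uint8_t *dir = image + (size_t)lba * SECTOR;
        size_t pos = 0;
        int found = 0;
        while (pos < size && !found) {
            const uint8_t *rec = dir + pos;
            if (rec[0] == 0) {                          /* padding to sector end */
                pos = (pos / SECTOR + 1) * SECTOR;
                continue;
            }
            if (rec[0] < 34 || pos + rec[0] > size)
                return -1;
            if (name_matches(rec, path, comp_len)) {
                lba = le32(rec + 2);
                size = le32(rec + 10);
                found = 1;
            }
            pos += rec[0];
        }
        if (!found)
            return -1;
        path += comp_len;
        while (*path == '/')
            path++;
    }
    *lba_out = lba;
    *size_out = size;
    return 0;
}
```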
Rock Ridge and Joliet
There are two enhancements to ISO 9660 that make it more suitable for the Unix and MS-Windows worlds. Both can be combined in the same filesystem, so a reader often has the choice between three file name spaces: plain ISO, Rock Ridge, and Joliet.
ISO and Rock Ridge will show the same tree of files but with different names. Joliet can show a completely different tree than ISO.
Rock Ridge allows file names of up to 255 bytes, using 8-bit characters; only the zero byte and the slash ("/") may not be used. It also adds the POSIX file attributes (owner, group, permissions, ...) and supports symbolic links.
Rock Ridge is an application of SUSP. It may be accompanied by other SUSP applications like zisofs (compression of data files, Linux specific), Apple ISO 9660 Extensions, Amiga AS entries, or Arbitrary Attribute Interchange Protocol (AAIP: Extended Attributes and ACLs). A reader of SUSP entries shall simply ignore all entry types which it does not expect.
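As an illustration of SUSP parsing, the C sketch below scans the System Use area of a single directory record for a Rock Ridge NM (alternate name) entry. The entry layout used here (2-byte signature, length byte, version byte, and for NM a flags byte followed by the name) comes from the SUSP and RRIP specifications listed under External links rather than from this article. The sketch assumes a SUSP skip length of 0 and does not follow Continuation Areas (CE entries); the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Look for Rock Ridge "NM" entries in the System Use area of one directory
 * record and copy the name into out (NUL-terminated). NM entries with the
 * CONTINUE flag (bit 0) set are concatenated. Unknown entry types are
 * skipped. Returns the name length, or -1 if there is no NM entry.
 */
static int rr_alternate_name(const uint8_t *rec, char *out, size_t out_size) {
    size_t id_len = rec[32];
    size_t pos = 33 + id_len + ((id_len & 1) == 0 ? 1 : 0);  /* skip pad byte */
    size_t end = rec[0];
    size_t n = 0;
    int seen = 0;
    if (out_size == 0)
        return -1;
    while (pos + 4 <= end) {
        const uint8_t *e = rec + pos;
        uint8_t len = e[2];
        if (len < 4 || pos + len > end)
            break;                             /* malformed or padding: stop */
        if (e[0] == 'N' && e[1] == 'M' && len >= 5) {
            size_t part = len - 5;             /* name bytes follow the flags byte */
            if (n + part >= out_size)
                part = out_size - 1 - n;       /* truncate to fit */
            memcpy(out + n, e + 5, part);
            n += part;
            seen = 1;
            if (!(e[4] & 0x01))               /* CONTINUE flag clear: name complete */
                break;
        }
        pos += len;
    }
    out[n] = '\0';
    return seen ? (int)n : -1;
}
```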
Joliet was defined by Microsoft to allow file names of up to 64 UCS-2 (16-bit) characters. It is implemented as a separate tree of Directory Records that begins with a root record in a Supplementary Volume Descriptor. That descriptor is similar to a Primary Volume Descriptor but has a Type Code of 2.
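A C sketch for recognizing a Joliet Supplementary Volume Descriptor and decoding its file names follows. The escape sequences "%/@", "%/C" and "%/E" at offset 88, which mark Joliet levels 1 to 3, and the big-endian byte order of Joliet names come from the Joliet specification, not from this article; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the Joliet level (1-3) if the 2048-byte descriptor is a
 * Supplementary Volume Descriptor carrying a Joliet escape sequence
 * at offset 88 ("%/@", "%/C" or "%/E"), otherwise 0.
 */
static int joliet_level(const uint8_t vd[2048]) {
    if (vd[0] != 2 || memcmp(vd + 1, "CD001", 5) != 0)
        return 0;
    if (vd[88] != '%' || vd[89] != '/')
        return 0;
    switch (vd[90]) {
    case '@': return 1;
    case 'C': return 2;
    case 'E': return 3;
    default:  return 0;
    }
}

/*
 * Convert a Joliet file identifier (big-endian UCS-2, len bytes) to UTF-8.
 * Stops at the ';' version separator. Returns the number of bytes written
 * (excluding the NUL), or -1 if out is too small.
 */
static int joliet_name_to_utf8(const uint8_t *id, size_t len, char *out, size_t out_size) {
    size_t n = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        unsigned c = ((unsigned)id[i] << 8) | id[i + 1];
        if (c == ';')
            break;
        unsigned char buf[3];
        size_t k;
        if (c < 0x80) {                        /* 1-byte UTF-8 */
            buf[0] = (unsigned char)c; k = 1;
        } else if (c < 0x800) {                /* 2-byte UTF-8 */
            buf[0] = (unsigned char)(0xC0 | (c >> 6));
            buf[1] = (unsigned char)(0x80 | (c & 0x3F)); k = 2;
        } else {                               /* 3-byte UTF-8 (rest of the BMP) */
            buf[0] = (unsigned char)(0xE0 | (c >> 12));
            buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (c & 0x3F)); k = 3;
        }
        if (n + k >= out_size)
            return -1;
        memcpy(out + n, buf, k);
        n += k;
    }
    if (n >= out_size)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```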
See Also
Articles
- El-Torito, a standard for creating bootable CD-ROMs
- Mkisofs, about ISO 9660 producing programs: mkisofs, genisoimage, xorriso
- Optical Drive, an overview about how to operate optical drives and media
External links
- ISO 9660 (ECMA-119) specification
- ISO 9660 on Wikipedia
- Boot entry points in ISO 9660 filesystems
- SUSP 1.12 (entries CE , PD , SP , ST , ER , ES)
- Rock Ridge: RRIP 1.12 (SUSP entries PX , PN , SL , NM , CL , PL , RE , TF , SF , obsolete: RR)
- Amiga SUSP entry AS
- libisofs SUSP application AAIP (SUSP entry AL)
- Joliet add-on specifications